Enterprise Database Systems
Statistics for Data Science #1
Data Science Statistics: Common Approaches to Sampling Data
Data Science Statistics: Inferential Statistics
Data Science Statistics: Simple Descriptive Statistics

Data Science Statistics: Common Approaches to Sampling Data

Course Number:
it_dssds1dj_02_enus
Lesson Objectives

Data Science Statistics: Common Approaches to Sampling Data

  • Course Overview
  • describe important terms associated with the sampling process
  • define sampling bias and describe problems caused by this phenomenon
  • define simple random sampling and enumerate its properties
  • define systematic random sampling and differentiate it from simple random sampling
  • define stratified random sampling and differentiate it from simple and systematic random sampling
  • define non-probability sampling and enumerate some non-probability sampling techniques
  • define the two properties of probability sampling, enumerate three types of probability sampling, and list two types of non-probability sampling

Overview/Description

Data science is an interdisciplinary field that seeks to find interesting, generalizable insights within data and then put those insights to monetizable use. In this 8-video Skillsoft Aspire course, learners can explore the first step in obtaining a representative sample from which meaningful, generalizable insights can be obtained. Examine basic concepts and tools in statistical theory, including the two most important approaches to sampling (probability and non-probability sampling) and the common sampling techniques used for each. Learn about simple random sampling, systematic random sampling, and stratified random sampling, including their advantages and disadvantages. Next, explore sampling bias. Then consider what is probably the most popular non-probability sampling technique, the case study, which is used in medical education, business education, and other fields. A concluding exercise on efficient sampling invites learners to review their new knowledge by defining the two properties of all probability sampling techniques, enumerating the three types of probability sampling techniques, and listing two types of non-probability sampling.
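The three probability sampling techniques named above can be illustrated with a minimal Python sketch. This is not taken from the course materials; the function names, the fixed seeds, and the proportional allocation used for the stratified case are all assumptions made for illustration.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Simple random sampling: every unit has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=0):
    """Systematic random sampling: random start, then every k-th unit.

    Assumes 0 < n <= len(population) so the interval k is at least 1.
    """
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random offset within the first interval
    return [population[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, seed=0):
    """Stratified random sampling: draw the same fraction from each stratum.

    `strata` maps a stratum label to the list of units in that stratum
    (proportional allocation is an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample
```

For example, `systematic_sample(list(range(100)), 10)` returns ten values spaced exactly ten apart, while `stratified_sample` guarantees that a small stratum is still represented in the sample.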



Target

Prerequisites: none

Data Science Statistics: Inferential Statistics

Course Number:
it_dssds1dj_03_enus
Lesson Objectives

Data Science Statistics: Inferential Statistics

  • Course Overview
  • draw the shape of a Gaussian distribution and enumerate its defining properties
  • enumerate the steps involved in hypothesis testing and define the null and alternative hypotheses
  • describe the role of the test statistic and p-value in accepting or rejecting a null hypothesis
  • enumerate types and uses of t-tests in hypothesis testing
  • outline the significance of skewness and kurtosis and define the skewness and kurtosis of a normally distributed random variable
  • calculate the autocorrelation of a time series
  • define linear regression
  • interpret the R-squared of a regression and identify overfitting
  • differentiate between null and alternative hypotheses, enumerate four use cases for t-tests, and calculate the correlation of time series with itself

Overview/Description

In this Skillsoft Aspire course on data science, learners can explore hypothesis testing, which is widely applied in data science. This beginner-level, 10-video course builds upon previous coursework by introducing simple inferential statistics, often called the backbone of data science because they seek to posit and then prove or disprove relationships within data. You will start by learning the steps in simple hypothesis testing, with the null and alternative hypotheses, the test statistic, and the p-value each introduced and explained in turn. Next, listen to an informative discussion of a specific family of hypothesis tests, the t-test. Then learn to describe the applications of t-tests and become familiar with their use cases, including linear regression. Learn about the Gaussian distribution and the related concepts of correlation, which measures the relationship between any two variables, and autocorrelation, a special form used in the context of time-series analysis. In the closing exercise, review your knowledge by differentiating between the null and alternative hypotheses in a hypothesis testing procedure, then enumerating four distinct uses for different types of t-tests.
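As a rough illustration of two ideas from this description, the sketch below computes a one-sample t statistic and the autocorrelation of a series at a given lag, using only the Python standard library. The function names are hypothetical and the formulas are the usual textbook ones, not necessarily the exact forms presented in the videos. Turning the t statistic into a p-value needs the t distribution, which is left out here.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    return (mean - mu0) / (s / math.sqrt(n))


def autocorrelation(series, lag):
    """Correlation of a time series with a copy of itself shifted by `lag`."""
    n = len(series)
    m = statistics.mean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom
```

A t statistic near zero means the sample mean is close to the hypothesized value relative to its standard error. An autocorrelation near +1 or -1 at some lag indicates a strong repeating pattern at that lag.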



Target

Prerequisites: none

Data Science Statistics: Simple Descriptive Statistics

Course Number:
it_dssds1dj_01_enus
Lesson Objectives

Data Science Statistics: Simple Descriptive Statistics

  • Course Overview
  • enumerate objectives of descriptive and inferential statistics and distinguish between the two
  • enumerate objectives of population and sample and distinguish between the two
  • enumerate objectives of probability and non-probability sampling and distinguish between the two
  • define the mean of a dataset and enumerate its properties
  • define the median and mode of a dataset and enumerate their properties
  • define the range of a dataset and enumerate its properties
  • define the inter-quartile range of a dataset and enumerate its properties
  • define the variance and standard deviation of a dataset and enumerate their properties
  • differentiate between inferential and descriptive statistics, enumerate the two most important types of descriptive statistics, and define the formula for standard deviation

Overview/Description

Along the career path to data science, a fundamental understanding of statistics and modeling is required. The goal of all modeling is to generalize as well as possible from a sample to the full population of data. In this 10-video Skillsoft Aspire course, learners explore the first step in this process. Key concepts covered here include the objectives of descriptive and inferential statistics, and distinguishing between the two; objectives of population and sample, and distinguishing between the two; and objectives of probability and non-probability sampling, and distinguishing between the two. Learn to define the mean of a dataset and its properties; the median and mode of a dataset and their properties; and the range of a dataset and its properties. Then study the inter-quartile range of a dataset and its properties, and the variance and standard deviation of a dataset and their properties. In the closing exercise, differentiate between inferential and descriptive statistics, enumerate the two most important types of descriptive statistics, and define the formula for standard deviation.
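The descriptive statistics listed here can all be computed with Python's standard `statistics` module. The sketch below is an illustration rather than course material: the `describe` helper and its dictionary keys are hypothetical names, and it assumes the sample form of the standard deviation, s = sqrt(sum((x_i - mean)^2) / (n - 1)). The quartiles come from `statistics.quantiles` with its default "exclusive" method, so other tools may report a slightly different inter-quartile range.

```python
import statistics


def describe(data):
    """Return common measures of central tendency and dispersion."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,                          # inter-quartile range
        "variance": statistics.variance(data),   # sample variance (n - 1 divisor)
        "stdev": statistics.stdev(data),         # square root of the sample variance
    }


summary = describe([2, 4, 4, 5, 7, 9, 11])
```

Because the range depends only on the two extreme values, a single outlier can change it dramatically, whereas the inter-quartile range is much less sensitive to outliers.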



Target

Prerequisites: none
